Three New Corpora at the Bavarian Archive for Speech Signals - and a First Step Towards Distributed Web-Based Recording

Authors

  • Christoph Draxler
  • Florian Schiel

Abstract

The Bavarian Archive for Speech Signals (BAS) has released three new speech corpora for both industrial and academic use: a) Hempels Sofa, which contains recordings of up to 60 seconds of unscripted telephone speech; b) ZipTel, a corpus of telephone speech covering postal addresses and telephone numbers from a real-world application; and c) RVG-J, an extension of the original Regional Variants of German (RVG) corpus with juvenile speakers. All three corpora were transcribed orthographically according to the SpeechDat annotation guidelines using the WWWTranscribe annotation software. Recently, BAS has begun investigating large-scale audio recording via the web, and RVG-J has become the testbed for this type of recording.

Similar resources

The SmartWeb Corpora: Multimodal Access to the Web in Natural Environments

As a result of the German SmartWeb project, three speech corpora, one of them multimodal, have been published by the Bavarian Archive for Speech Signals (BAS). They contain speech and video signals from human–machine interactions in real indoor and outdoor environments. The scenarios for these corpora are a typical handheld PDA interaction (SHC), an interaction on a running motorcycle (SMC) a...

Phonemic Segmentation and Labelling using the MAUS Technique

We describe the pronunciation model of the automatic segmentation technique MAUS, which is based on a data-driven Markov process, and a new evaluation measure for phonemic transcripts, relative symmetric accuracy; results are given for the MAUS segmentation and labelling of German dialog speech. MAUS is currently distributed as a freeware package by the Bavarian Archive for Speech Signals and will also be ...

Wikispeech - a content management system for speech databases

In this paper we describe WikiSpeech, a content management system for the web-based creation of speech databases for the development of spoken language technology and basic research. Its main features are full support for the typical recording, annotation and project administration workflow, easy editing of the speech content, plus a fully localizable user interface. For the creation of a new s...

Web-Based Speech Data Collection and Annotation

The WWW is a ubiquitous, mature communication infrastructure for business and scientific information interchange. Since 1997, the Bavarian Archive for Speech Signals (BAS) has been developing and using web-based annotation tools for large-scale speech databases. Recently, it has developed an application for recording speech via the WWW. Both the annotation and the recording tools are now integrat...

PercyConfigurator - Perception Experiments as a Service

PercyConfigurator is an experiment editor that eliminates the need for programming; the experiment definition and content are simply dropped onto the PercyConfigurator web page for interactive editing and testing. When the editing is done, the experiment definition and content are uploaded to the server. The server returns a link to the experiment which is then distributed to potential particip...

Publication date: 2002